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ABSTRACT 

We discuss a Bayesian approach to the analysis of radial velocities in planet searches. 
We use a combination of exact and approximate analytic and numerical techniques to 
efficiently evaluate \ 2 for multiple values of orbital parameters, and to carry out the 
marginalization integrals for a single planet including the possibility of a long term 
trend. The result is a robust algorithm that is rapid enough for use in real time anal- 
ysis that outputs constraints on orbital parameters and false alarm probabilities for 
the planet and long term trend. The constraints on parameters and odds ratio that we 
derive compare well with previous calculations based on Markov Chain Monte Carlo 
methods, and we compare our results with other techniques for estimating false alarm 
probabilities and errors in derived orbital parameters. False alarm probabilities from 
the Bayesian analysis are systematically higher than frequentist false alarm probabili- 
ties, due to the different accounting of the number of trials. We show that upper limits 
on the velocity amplitude derived for circular orbits are a good estimate of the upper 
limit on the amplitude of eccentric orbits for e < 0.5. 
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1 INTRODUCTION 

The analysis of a set of radial velocities in planet searches 
typically involves a number of different steps. First, the 
best fitting Keplerian orbital parameters are found by min- 
imizing x 2 , for example with a Levenberg-Marquardt algo- 
rithm (Press et al. 1992). Because of the complex multi- 
modal shape of the x 2 distribution in parameter space, a 
Lomb-Scargle periodogram (Lomb 1978; Scargle 1982) is of- 
ten used beforehand to fit circular orbits at a range of or- 
bital periods, providing starting points for the Keplerian fit. 
Then the reality of the signal is assessed by calculating the 
false alarm probability (FAP) that the observed signal could 
arise due to noise fluctuations, typically using Monte Carlo 
simulations (Marcy et al. 2005; Cumming 2004). This has 
become particularly important as radial velocity surveys re- 
veal planets with lower velocity amplitudes, comparable to 
the measurement uncertainties and other sources of noise. A 
related question is comparing different models for the data, 
for example deciding whether a two (or more) planet model 
is preferred over a single planet model, or whether to include 
a long term trend due to a long period companion (see for 
example, Robinson et al. 2007). 

Uncertainties in the fitted orbital parameters are then 
calculated. A common technique is to scramble the residuals 
to the best fit Keplerian orbit, add them back to the pre- 
dicted velocity curve, and refit the orbit. After repeating this 



many times, the distribution of fitted parameters gives an 
estimate of the uncertainty (Marcy et al. 2005). Another 
approach is to use Bayesian methods implemented with 
Markov Chain Monte Carlo (MCMC) simulations (Ford 
2005). In the case of a non-detection, the upper limit on the 
planet mass as a function of orbital period is an important 
input for population studies (Walker et al. 1995; Cumming 
et al. 1999; Endl et al. 2002; Wittenmyer et al. 2006). 

Often, all of these steps must be carried out for a given 
radial velocity data set. Many of them are based on Monte 
Carlo simulations involving fitting Keplerian orbits to syn- 
thetic data sets. These trials can become cumbersome for 
the large numbers of orbital frequencies that must be con- 
sidered. For this reason, recent calculations have produced 
upper limits for circular orbits only (Cumming et al. 2008), 
relying on the fact that detectability of planets falls off with 
eccentricity only for e > 0.6 (Endl et al. 2002; Cumming 
2004), or on a sparse grid of orbital period values (O' Toole 
et al. 2008). 

We focus in this paper on a Bayesian approach to the 
analysis of radial velocity data. The advantage is that, in 
principle, a Bayesian analysis answers all of the above ques- 
tions with a single calculation, providing constraints on 
model parameters and odds ratios which can be used to 
decide which model best describes the data (Ford 2005; Gre- 
gory 2005b; Cumming 2004). This would simplify analysis 
of radial velocity data sets. The difficulty in practice is that 
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the marginalization over parameters requires the evaluation 
of multidimensional integrals over parameter space. 

Bayesian methods have been applied to planet searches, 
using sophisticated Markov Chain Monte Carlo (MCMC) 
techniques to evaluate the integrals. Ford (2005) applied this 
technique to determining the constraints on orbital param- 
eters, while Gregory in a series of papers (Gregory 2005b, 
2007a, b) also considers model comparison. Ford (2006) in- 
vestigated different proposal distribution functions to help 
speed convergence of MCMC chains. Whereas the chains 
used by Ford (2005) were 10 6 -10 10 steps in length, Ford 
(2006) found that it was possible to achieve convergence 
after only 10 4 -10 6 steps by optimizing the directions in pa- 
rameter space in which steps are taken. Recently, Balan & 
Lahav (2009) have developed an MCMC code Exofit based 
on the methodology of Ford & Gregory (2007) which is pub- 
licly available 1 . 

Despite this tremendous progress, the application of 
MCMC to radial velocity data is not yet routine, although it 
is commonly used to assess uncertainties in orbital param- 
eters. As well as optimizing the steps in parameter space, 
one important difficulty in using the MCMC approach is as- 
sessing whether the chains have converged. Another is that 
when the signal to noise ratio is low and the distribution 
of x 2 m parameter space is multimodal, the MCMC chain 
may miss minima in \ 2 ■ Gregory (2005b) introduced a par- 
allel tempering scheme in which several MCMC chains are 
run simultaneously, each with a different temperature, hot- 
ter chains making larger jumps in parameter space, colder 
chains exploring local minima. As the calculation progresses, 
the chains exchange information in a way that preserves 
their statistical character. This scheme has been successfully 
applied to multiple planet systems (Gregory 2007a, b). 

In this paper, we take a different approach. We consider 
models with one planet only, or one planet plus a long term 
linear trend, and use a combination of grid-based numerical 
evaluation and exact and approximate analytic methods to 
evaluate the marginalization integrals. The idea is to look 
for ways in which the marginalization integrals can be eval- 
uated more efficiently. As well as providing a useful tool for 
analysing radial velocity data for single planet systems, it 
provides a check on the output of MCMC simulations, and 
may have application to making MCMC codes for analysis 
of multiple planet systems more efficient. 

We start in §2 with an overview of the Bayesian ap- 
proach, including how to write down the posterior probabil- 
ities for orbital parameters, and how to use them to calculate 
false alarm probabilities. In §3, we discuss circular orbits, 
using analytic techniques to evaluate the marginalization 
integrals. In §4, we divide the parameters for eccentric or- 
bits into fast (linear) and slow (non-linear) parameters, and 
use the analytic techniques for circular orbits to marginalize 
over the fast parameters. In §5, we compare our results to 
MCMC calculations, and traditional methods for evaluat- 
ing false alarm probabilities and upper limits on companion 
mass. 



1 Available at http://www.http://zuserver2. star. ucl.ac.uk/ la- 
hav/exofit.html . 



2 OVERVIEW 

Bayesian analysis of radial velocity data has been discussed 
previously by several authors (Ford 2005, 2006, 2008; Ford & 
Gregory 2007; Gregory 2005a,b, 2007a,b; Cumming 2004). 
Here we give a brief reminder of the basic ideas and intro- 
duce our notation, and show how the systemic velocity and 
noise uncertainty can be analytically marginalized. 



2.1 Parameter estimation 

We start with a model for the radial velocities with set of 
parameters a. For example, a single Keplerian orbit has six 
parameters a = (K,P,e,u! p ,t p ,^), where K is the velocity 
amplitude, P the orbital period, e the eccentricity, u> v and 
t p are the argument and time of pericenter, and 7 is the 
systemic velocity. The data consist of a set of N measured 
velocities w», observation times U, and errors a t . Bayes' the- 
orem allows us to calculate the probability distribution of 
the parameters a given the data, also known as the poste- 
rior probability of a, 



V(a\d) 



P(a)P(d\a) 
P(d) ■ 



(1) 



where P(d) is a normalization factor. The term P(a) is the 
prior probability distribution for the parameters a, which 
allows us to specify any knowledge of the parameter distri- 
bution that we have before the data are taken. If the errors 
are Gaussian-distributed and uncorrelated, the likelihood of 
the data, or probability of the data given a particular choice 
of model parameters is 



P(d\a) 



1 



exp 



X 2 {a) 



where 



X 2 (a) 



N 

E 



Wi (Vi - Vi(a)Y 



(2) 



(3) 



is the usual \ statistic, written in terms of weights Wi = 
1/af . We write the model velocity at time U as Vi. 

Often, we are interested in the probability distribution 
of a single parameter, or a subset of parameters. For exam- 
ple, a circular orbit has a — (K, P, 7, <f>), where K is the 
velocity amplitude, P the orbital period, 7 the systemic ve- 
locity, and (j> the orbital phase. It is likely that we are not 
interested in the particular values of 7 or <j>, but want to con- 
strain the orbital period and velocity amplitude. The joint 
probability distribution for P and K can be obtained by 
marginalizing over the other parameters, 



V{P,K\d)= / # / d 7 V(P,K,ct> n \d). 



(4) 



Marginalization amounts to performing a weighted average 
of the probability distribution over the unwanted parame- 
ters. 

A number of other useful quantities can be obtained 
from P(P, K\d). Further integration over K gives V(P\d), or 
integration over P gives P(K\d). A confidence interval for K 
can be calculated from P(K\d). For example, if a planet is 
not detected in a given data set, an upper limit can be placed 
on the amplitude of undetected orbits. The 99% upper limit 
K 99 is given by f* 99 dKV{K\d)/ dKV{K\d) = 0.99. 
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For eccentric orbits, we will focus in this paper on ob- 
taining V(P,e,K\d) by marginalizing over 7, lo p , and t p . 

2.2 The noise distribution 

In equation (2), we assumed that the standard deviation 
of the noise for each observation cr» was given. In reality, 
other noise sources may be present in the data that hinder 
the identification of planetary signals, for example intrinsic 
stellar "jitter" (e.g. Wright 2005) due to rotation of spots 
across the surface of the star, or changes in line profiles 
over time related to magnetic activity. This extra noise can 
be incorporated as an additional parameter of the model. 
A common choice (Gregory 2005b; Ford 2006) is to add the 
extra noise term in quadrature with the measurement errors 

Here, we instead multiply each value of a, by a noise 
scaling factor k (Cumming 2004; Gregory 2005a), and ana- 
lytically marginalize over k (e.g. Sivia 1996) 



V(d\a) oc 



r dk 1 / 



2k 2 



(X 2 {a)) 



-N/2 



(5) 



The constant prefactor, which depends only on the weights 
Wi and the number of observations N, does not affect the 
shape of the posterior probability distributions, and cancels 
out when we calculate odds ratios. Therefore, we drop it and 
replace equation (2) with 

-N/2 



V(d\a) = ( X 2 (a)Y 



(6) 



This is a Student's t-distribution rather than Gaussian dis- 
tribution (Sivia 1996). 

We take an infinite range for k in equation (5), whereas 
in reality we likely have some information about the uncer- 
tainty in the noise level. For example, we may be able to 
estimate the size of the expected stellar jitter based on stel- 
lar properties (Wright 2005). Alternatively, we could keep k 
as a parameter, and evaluate the constraints on k from the 
data, V{k\d) (e.g. Ford 2006). We have tried marginalizing 
numerically over k with finite limits, and find that the re- 
sults for realistic ranges of k are close to the analytic case 
with infinite limits. Therefore we marginalize analytically 
over k and adopt equation (6) as the likelihood throughout 
this paper 2 . 

2.3 Priors 

The choice of appropriate prior probabilities V(a) for the 
various parameters has been discussed in depth in the lit- 
erature (e.g. Gregory 2005b; Ford & Gregory 2007). We 
mostly follow this previous work. For circular orbits, we 
use uniform priors for 7 and (f>, and priors for K,P that 
are uniform in log (the Jeffreys prior). For eccentric or- 
bits, we take uniform priors in 7, t p , ui p , e, and log-uniform 
priors in K, and P, i.e. T(P) = l/(Plog(P 2 /Pi)) and 
V(K) = 1 / (K \og(K2 / K\)) . If a long term trend is included 
in the model (a linear term f3ti\ see Appendix A), we take a 
uniform prior in the slope /3. 

2 The techniques we develop below for rapidly marginalizing over 
parameters can also be applied to the case where k is kept as a 
parameter. This is discussed in Appendix B. 



In fact, Gregory (2005b) and Ford & Gregory (2007) 
use a modified Jeffreys prior for the noise term and the ve- 
locity amplitude rather than the standard Jeffreys prior. A 
modified Jeffreys prior is uniform in log above some scale, 
and uniform below that scale. For radial velocity amplitude 
K or extra noise term, the turnover scale is taken to be 
£s 1 m/s. The values of K we are interested in are typically 
larger than this, and so for simplicity we use a Jeffreys prior 
between our lower and upper limits in K. The ranges that 
we take are K — 1 m/s to 2 At;, where Av is the observed 
range of velocities, and P — 1 day to the time span of the 
observations. 



2.4 Model comparison and the false alarm 
probability 

Marginalizing over all the parameters of a model gives 
the total probability of that model. For example, given 
V(P, K\d) for circular orbits, we could calculate the total 
probability that a planet is present 



V(l\d) 



dK V(P,K\d). 



(7) 



Similarly, by considering a model without a planet, we can 
calculate the probability that no planet is present given the 
data, P(0\d). We define the normalization V{d) in equation 
(1) so that the sum over the probabilities of all models is 
unity. For example, if we consider only two possible models, 
that there is or is not a planet present, we choose V{d) such 
that 



V(l\d)+V(Q\d) = 1. 



(8) 



We can think of the posterior probability that there is 
no planet present P(0\d) as the false alarm probability. It 
can be written without including the P(d) factors explicitly 



F = V{0\d) = 



1 + A 

where A is the odds ratio 
V{l\d) 



A = 



V(0\d) 



(9) 



(10) 



(the normalization factors P(d) cancel out when the ratio is 
taken). For A»1,F« A -1 . The odds ratio is 



_ JdKfdP V{P,K\d) 
for circular orbits, or 



A = 



JdKjdPjde V{P,K,e\d) 
V(0\d) : 



(11) 



(12) 



for eccentric orbits. 

This approach can be generalized to more than two 
models. For example, later we will consider four possible 
models for a given star, the possible combinations of includ- 
ing or not including a Keplerian orbit with period less than 
the time span of the observations, and including or not in- 
cluding a long term trend. To calculate the false alarm prob- 
ability associated with the short period planet, we define the 
odds ratio 
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A = 



P(l\d)+P(l,t\d) 
T(0\d)+T(0,t\d) 



(13) 



Vi = 7 + K sin (wtj + c 



(17) 



where 1 or indicate that the short period planet is or is 
not included in the model, and t indicates that a long term 
trend is included. 



2.5 Probability that there is no planet V(0\d) 

The posterior probability of no planet V(0\d), where the 
"no planet" model is a constant velocity Vi = 7, can be 
calculated analytically. Using the likelihood of equation (6) 



V(0\d) = 



V(d)A-y 



dy (x 2 (i)) 



-N/2 



(14) 



where A7 = 72—71 is the range of values of 7 considered, 
and we assume a uniform prior for 7 in that range. Mini- 
mizing x 2 with respect to 7, we find the best-fitting value 
70 = ^WiVi/ ^Wi. In terms of 70, we can write 



X (7) 



X 2 (7o) + (7- JoY^Wi. 



(15) 



The fact that the distribution of X 2 {l) i s analytic is men- 
tioned in Ford (2006). For clarity, we drop the subscript i on 
the sum in equation (15) and in the remainder of the paper, 
a sum over the observations with i running from 1 to iV is 
implied. 

The quadratic form of \ 2 allows the integral over 7 to 
be carried out analytically when the limits 71 — > —00 and 
72 — » 00. In that limit, the normalization factor diverges, 
A7 — > 00. However the values of A7 cancel when we form an 
odds ratio, as does the normalization factor V{d). Therefore, 
we can drop the prefactor after integrating, giving the final 
result 



V{Q\d) = ( X 2 (7o)) 



-(N-l)/2 



(16) 



An alternative "no planet" model is a linear trend in the 
radial velocities over time, Vi = 7+/%,. A linear term is often 
included (and needed) in radial velocity fits to account for 
additional companions with long orbital periods. A similar 
formula for V(0\d) can be derived in that case. For clarity, 
we leave this to Appendix A, along with how to add a linear 
term to the circular and Keplerian orbit fits, and consider 
only the constant velocity no planet model in the main text. 



3 CIRCULAR ORBITS 

In the previous section, we saw that a calculation of P(a\d) 
followed by successive marginalization provides constraints 
on all model parameters and a measure of the false alarm 
probability. The difficulty in practice is in performing the 
integrals over parameter space. We first consider circular 
orbits, which have a simple sinusoidal velocity curve, and 
introduce some analytic approximations that allow us to 
rapidly carry out these integrals. Apart from being a testing 
ground for these techniques which we will then apply to ec- 
centric orbits, fitting circular orbits is actually quite useful 
since sinusoid fits are sufficient to detect orbits even with 
moderate eccentricities (e < 0.5; Endl et al. 2002; Cumming 
2004). 

For a circular orbit, the model for the velocities is 



which has four parameters: 7 is the systemic velocity, K is 
the velocity semi-amplitude, cf> the phase and w = 2n/P 
the orbital frequency, P is the orbital period. Our aim 
in this section is to obtain V(P,K), marginalizing over 7 
and 4>. We first marginalize analytically over 7 to obtain 
V{4>, K, P\d), and then present two different methods for ef- 
ficiently marginalizing over the parameters K and <j>- The 
methods are summarized and applied to an example data 
set in §3.5. 



3.1 Analytic marginalization of the systemic 
velocity 

To integrate over 7, we note again that \ 2 depends quadrat- 
ically on 7 around the best-fit value, as given by equation 
(15), where this time 70 (<^, K, P) is the best fit systemic ve- 
locity at each (f>, P and K, that is 70 (<f>, K, P) is the value of 
7 that minimizes \ 2 at each <j>, P and K, and x 2 (lo) is the 
corresponding minimum value of \ 2 ■ The best-fit systemic 
velocity can be calculated from d\ 2 /&y = 0, giving 



7() = W: 



[vi — K sin (cuU + 4>)] I '1 



(18) 



Adopting a uniform prior for 7 and integrating for A7 — > 00, 
we find 

P(d\<f>,K,P) = ( X 2 ho,<j>,K,P]y (N - 1)/2 , (19) 

where 70 (<j>, K, P) is given by equation (18), and we have set 
the prefactor equal to unity as in §2.5. 



3.2 Evaluation of V{<j>, K, P\d) on a grid 

Next, we describe a method for rapidly evaluating 
V(4>, K, P\d) numerically for a grid of values of <f>, K, and 
P. We introduce the averages 



(v) 


= wjVj/ Wj 


(C) 


= Wj cos(cuti)/ Wj 


(S) 


= ^ Wj sin(arfi)/ ^ Wj 


(vC) 


= WjVj cos(a>ti )/ y^wj 


(vS) 


= WjVj sin(utj)/ Wj 


(C 2 ) 


= Wj cos 2 (wu ) / Wj 


(S 2 ) 


= Wj sin 2 (u>tj ) / Wj 


(SC) 


= Wj cos(cjti) sin(utj)/ w. 



(20) 

In this notation equation (18) can be written 70 = (v) — 
K(C) sin (f) — K{S) cos (j). Substituting this expression into 
\ 2 and simplifying, we find 

x2{ ^ K ' P) = {{v 2 }} - 2K {((vC)) sin 4> + {{vS}} cos 0] 

+K 2 [{{C 2 )) sin 2 4, + {{S 2 }} cos 2 4, 

+2{{SC))sm(t>cos(t>\ , (21) 

where = ((/ - {f))(g - (g))) = (fg) - (f){g). 



© 2009 RAS, MNRAS 000, 1-16 



Analysis of Radial Velocities 5 



Equation (21) allows efficient calculation of x for multi- 
ple values of the parameters 4>,K and P. Given three vectors 
— a vector of K values, a vector of <f> values (and correspond- 
ing values of sin <p and cos (j>), and a vector of orbital periods 
and the corresponding averages over the data (terms in an- 
gle brackets) — a 3-dimensional matrix of \ 2 values can 
be quickly generated. The advantage is that the sums over 
the data need to be calculated only once, rather than being 
reevaluated for each new choice of K and <fi. 

Marginalizing over <f> is then straightforward, since the 
integral 

1 f 2n 

V{d\K,P) = —J d(pV{d\(p,K,P) 



= I jjd* (x^,K,P))- (N -^ 



(22) 



can be calculated using a quadrature method based on the 
values of cf> in the grid. To calculate the odds ratio, we 
should compare this with the probability for a no planet 
model, which has Vi = 7 only. In this case, 70 = {v), and 

xo/J2 w ' = « w2 )>> so that 

/ , ^ x-(JV-l)/2 

which can be used in equation (11) for A. 



3.3 Analytic marginalization of <j> and K 

The reason that we could analytically integrate over 7 is that 
the model Vi is linear in 7. Now in fact, we can perform 
a similar analytic integration over K and <j> by rewriting 
equation (17) in terms of the linear parameters A and B, 



Vi — 7 + A sin uit i + B cos uiti 



(24) 



where A — Kcos(f> and B = Ksm<f). In seminal papers 
on Bayesian signal detection, Bretthorst (1988) carried out 
analytic integration over A and B, and we follow the same 
approach here (see also Ford 2008). 

To perform the integration, we use the fact that the 
quadratic shape of \ 2 that we found for 7 (eq. [15]) gener- 
alizes to an arbitrary linear model Vi = ^2 k akgk(U). It is 
straightforward to show that 3 



X 2 (a) = X 2 ( a "o) + 8a- a - Sa 



(25) 



where the matrix a is the inverse of the correlation ma- 
trix (Press et al. 1992), and has components a kl — 
(l/2)(<9 2 x 2 /ddkdai) = ^2wigk(ti)gi(ti). Tne marginaliza- 



3 There is an approximation known as the Laplace approximation 
(Sivia 1996) in which the quadratic form in equation (25) is as- 
sumed close to the mimimum \ 2 value. Ford (2008) applied this 
approximation to circular orbit fits at specified orbital periods, 
but in fact as we have noted here the approximation is exact in 
this case because the model is linear. We have tried applying the 
Laplace approximation to carry out the integral in <p in eq. [22]. 
However, we find that this approximation does not perform well 
at low K, where V(d\cp) is bimodal, and in addition is not con- 
venient numerically as it requires a search for the peak in V(d\<f>) 
at each value of K. 



tion integral with uniform priors for the parameters can be 
done analytically 4 



J d m a ( X 2 )- 



N/2 



(x§) 



i/2p 1 



Vdet a 



r(f) 



(26) 



where m is the number of parameters integrated over. We 
use subscript zero to indicate the best fit value of parame- 
ters, or the corresponding minimum value of \ 2 ■ 

Applying this result to the integration over A and B 
gives 



V(P\d) 



dAdBdy 
AAABAy 

1 (x§) 



( X 2 (A,B, % P))- N/2 



(27) 



PAAABA7 VdeF^ r(f) 

The values of Xo and det a can be calculated as a func- 
tion of P as follows. First by minimizing x 2 with respect 
to A, B, and 7, the best fit values of 7, A = Kcos<f> and 
B — Ksin<t> are 



B, 



((vS))((C 2 )}~({vC))({SC)) 

({&))({&))- {{SC))* 
((vC})({S 2 }}-((vS}}{(SC)} 



«C 2 »«S2»-«SC»2 
70 = (v) - A (S) - B (C) 

and the minimum value of x 2 is 
^ = «« 2 » - 2A ((vS)) - 2B {(vC)) 
+Al((S 2 )) + Bl((C 2 )) + 2A B {(SC)) 



(28) 

(29) 
(30) 



(31) 



((S 2 ))((C 2 ))-((SC)) 2 



(32) 



and 
det a 

This allows us to easily calculate P(P\d). 

The only remaining question is what to choose for the 
prior ranges AA and AB (the prior range in gamma A7 can- 
cels when we form the odds ratio). Unfortunately, the ana- 
lytic evaluation of the integral in equation (27) is only possi- 
ble for a uniform prior in A and B. Since dAdB — KdKd(f>, a 
uniform prior in A and B corresponds to a prior V(K) oc K 
rather than the logarithmic prior V(K) ccl/K that we as- 
sumed in the grid-based calculation (see discussion in Bret- 
thorst 1988 who chose a different prior to Jaynes 1987). 
Therefore the analytic marginalization gives more weight 
to large K solutions, whereas the grid based approach gives 
more weight to small K solutions. We correct for this in an 
approximate way by choosing the normalization appropri- 
ately. We find that the choice 



AAAB = Ko(P)K 0}&v logftfa/tfi) 



(33) 



where ifo,av is the best fit velocity amplitude averaged 
over all frequencies reproduces the normalization of the grid 
based calculation, with final odds ratios typically within a 
factor of 2. 

4 To prove equation (26), follow the method given in the Ap- 
pendix of Sivia (1996), where a similar result is derived for a 
likelihood oc exp(— x 2 /2). 



© 2009 RAS, MNRAS 000, 1-16 



6 A. Gumming and D. Dragomir 



3.4 The probability distribution of K at each 
orbital period 

Analytical marginalization over the linear parameters A and 
B is convenient, but in doing so, we have thrown away infor- 
mation about the velocity amplitude K. It turns out that we 
can get it back very easily using an analytic approximation 
for the shape of V(d\K) due to Jaynes (1987) 5 . The idea is 
to assume the parameters A and B are uncorrelated 6 , giving 



V(A,B\d) oc exp 



(A-A ) 2 (B~B f 



2a\ 



2a 2 



(34) 



where A§ and Bo are the best fit values, and a a and as 
are the errors in determining A and B from the data. Now 
writing A = K cos cj> and B = K sin <f>, we find 



V(K, 4>\d) oc exp 



K 2 
2a 2 K 



j. KK ° ( 
H s — cos( 



(35) 



where 4>o is a constant that can be determined (the pre- 
cise value is not important here), the best fit amplitude is 
Kq = (A 2 , + £?o) 1//2 , an< i we assume a\ = a 2 B = a 2 K . If the 
variance of the noise is s 2 , we expect to be able to determine 
the amplitude K to an accuracy a 2 K w 2s 2 /N. Using this ap- 
proximation, together with the integral representation of the 
modified Bessel function 

(■2tt 



gives 



2tt 



V(K\d) oc exp 



dt e z 



NIC 
As 2 



\ 2s 2 ) 



(36) 



(37) 



Since we want a prior for K of V(K) oc 1/K, we divide the 
area element dAdB = KdKd(f> by K 2 , giving the final result 



V(K\d)dK oc exp 



NIC 
4s 2 



T ( NKKo \ dK 



K 



(38) 



Comparing to the results of our grid search, we find that 
equation (38) reproduces the distribution of K values at each 
orbital period remarkably well. We estimate s 2 as the mean 
square deviation of the residuals to the best fit sinusoid, or 
s 2 (P) = Xo{P)/ X/ Wi - W e normalize the distribution of K 
at each P so that J~P(K,P\d)dK = V(P\d), where V(P\d) 
is determined from the analytic marginalization over A and 
B (eq. [27]). This choice of normalization as a function of P 
gives the best agreement with the grid code. 



3.5 Summary and example 

Let's summarize the main results of this section. We have 
discussed two methods for evaluating P(P, K\d) for circular 

5 We follow a slightly different argument than Jaynes (1987), 
but with the same spirit. The same approach was used by Groth 
(1975) to derive the statistical distribution of periodogram pow- 
ers in the presence of a signal plus Gaussian noise, and recently 
Shen & Turner (2008) made a similar approximation to derive 
the shape of the probability density for eccentricity in a Keple- 
rian orbit fit. 

B This is a good approximation for large N . The covariance be- 
tween A and B is oc ^ Wi sin ujti cos u)ti which averages to zero 
for large iV. 




K (m/s) 



Figure 1. Results of circular orbit fitting to data for HD 4203, 
using analytic marginalization over K and 4>, and reconstruct- 
ing V(K) using equation (38) (red curves) and by calculating 
P(4>, K, P) on a grid (black curves). V{K) in the third panel is 
normalized such that each curve has the same area beneath the 
curve. The bottom panel compares the P(K) obtained for periods 
420.1 days (solid curves, close to the best-fitting frequency) and 
19.0 days (dot-dashed curves, no significant fit at this frequency) 
for the analytic and grid-based approaches. 



orbits. First, equations (21) and (22) can be used to calculate 
\ 2 (P, K, <j>) f° r many different values of P, K, and (f>, and 
from there V(P,K\d) obtained by integration over <f). No 
approximations are made in this approach, which we refer 
to as the "grid-based approach" . Second, equations (27) to 
(33) provide a method for evaluating P(P\d) using analytic 
marginalization over the linear parameters A and B and 
therefore K and (f>. The analytic marginalization requires 
that we assume a prior P(K) oc K rather than 1/K, but 
by choosing the normalization appropriately (eq. [33]), we 
approximately recover the results corresponding to V(K} oc 
1/K. Next, given the best fit amplitude K = (A 2 , + B$) 1/2 
at each period, V(P,K\d) can be calculated for a grid of 
K values using the analytic approximation of equation (38). 
We refer to this second approach as the "analytic approach" . 

As an example, we consider the 23 radial velocities for 
the star HD 4203 made available in the Butler et al. (2006) 
catalog of nearby exoplanets (see Vogt et al. 2004 for the 
original discovery of this planet). The orbital parameters 
given by Butler et al. (2006) are P = 431.88 ± 0.85 days, 
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K = 60.3 ± 2.2 m/s, and e = 0.519 ± 0.027. They also 
include a linear long term trend of — 4.38±0.71 m s _1 yr _1 . 
The rms of the residuals to this solution is 4.1 m/s. 

We use both of the techniques described above to fit 
a circular orbit plus constant to this data. We consider 
orbital periods between 1 day and the time-span of the 
data, T = 2000 days. We evaluate ANf frequencies, where 
Nf — (A/)T is the estimated number of independent fre- 
quencies in the frequency range A/ (Cumming 2004). The 
values of K considered range from 1 m/s to 2 At), where Av is 
the range of the measured velocities. For the grid-based ap- 
proach, we find that the typical time required on a 2-3 GHz 
CPU is ~ 10~ 7 s per set of parameters (4>,K,P), so that 
for example 3000 periods, 100 values of K and 30 phases, 
or 10 7 total combinations, takes 1 s to evaluate. For <j>, we 
align the grid with the best fit phase 4>o at each P. In this 
way, we guarantee that the best fit value of <f> is included on 
the grid, which reduces the number of grid points we need 
to use in <j>. The analytic marginalization technique requires 
~ 5 x 10~ 7 s per P and K value, so that a search of 3000 
periods, keeping track of 100 values of K takes ~ 0.1 s. We 
use the routine bessiO from Press et al. (1992) to calculate 
the Bessel function in equation (38). 

Figure 1 compares the two techniques. The red curves 
show the results of the analytic marginalization, the black 
curves show the results of the grid-based calculation. The 
false alarm probabilities are 0.14 (grid) and 0.060 (analytic) 
(odds ratios 6.3 and 16 respectively). The distribution of K 
agrees well between the two techniques, although the prob- 
ability curve is shifted to larger values of K for the analytic 
approach compared to the grid approach, consistent with 
the different priors. The false alarm probability ~ 0.1 means 
that this would not count as a detection. This is an example 
of a case in which the large eccentricity e > 0.5 prevents 
detection by fitting circular orbits. The best fit amplitude 
K « 30 m/s for circular orbits is significantly smaller than 
for the Keplerian orbit fit of Butler et al. (2006). Using 100 
values of K between 1 m/s and 60 m/s, we find the 99% 
upper limit on K is 41.2 m/s (analytic) or 41.3 m/s (grid). 
The bottom panel in Figure 1 compares the probability dis- 
tribution of K at two different periods obtained from the 
grid-based approach and the analytic approach. This shows 
that equation (38) reproduces the distribution from the grid- 
based calculation well. 



4 ECCENTRIC ORBITS 

We now consider full Keplerian fits to the data. The tech- 
niques we developed in the previous section for circular or- 
bits can be readily applied to Keplerian orbits, because the 
Keplerian model is linear in a subset of parameters which 
can therefore be treated analytically, as we now describe. 



where K is the velocity amplitude, e is the eccentricity of the 
orbit, lo p is the argument of periastron 7 . The true anomaly 
9 is a function of the time t and the three parameters c, P, 
and t p , where t p is the time of periastron passage (acting 
as an overall phase for V(t)). To calculate 6(t;e, P,t p ), we 
must solve the relations 



tan (D = (S) 1 tan (f) 



E-esinE — M — -^(t-t P ) 



(40) 
(41) 



where E is the eccentric anomaly, and M the mean anomaly. 

The first point to note is that the six orbital parameters, 
a = (7, K, u)p, P, e, t p ) can be divided into two groups, "slow" 
and "fast" parameters, a s = (P, e, t p ) and a} = (7, K, u p ) 
respectively. Each time we change a value of the slow param- 
eters, we must re-solve equations (40) and (41) to calculate 
the values of 9, whereas when we change a value of the fast 
parameters only we do not need to recalculate the values of 
9. This is reminiscent of the division into fast and slow pa- 
rameters in analysis of CMB data (e.g. Lewis & Bridle 2002; 
Tegmark et al. 2004). We can use this division to increase 
the speed of the parameter search. 

For a given set of the slow parameters, we can find the 
best fitting fast parameters with a linear least-squares fit, 
since we can write 



V = A sin 9 + B cos 9 + 7 



(42) 



with A = —K sinujp, B = K coslo p , and 7 = 7 + Kecosuj p . 
A linear least-squares fit returns the best-fitting values of 
A,B, and 7, and therefore K {K 2 = A 2 + B 2 ), lo p (tancj p = 
—B/A), and 7. This halves the number of parameters that 
we need to search to find the best-fitting solution. 

The fact that the fast parameters a} can be obtained 
from a linear fit means that we can directly apply the tech- 
niques we developed for circular orbits in §3 to marginal- 
ize over them. For the grid-based approach, equation (21) 
should be replaced by 

X 2 (K, Lo p ) = + 2R _ ^ vC ^ cog ^ 



+K 2 [{{C 2 }} cos 2 L0 P + ({S 2 }} sin 2 u) p 

-((SC))2sincj p cosa;p] (43) 



where uj p now plays the same role as <f> for circular orbits, 
and the sums over the data involve 0, rather than u)ti. For 
example, the definition of (S) in equation (20) should be 
replaced by (S) = ^2 v>i sm 

Similarly, since equations (24) and (42) are of the same 
form, the analytic integration over A and B can be ap- 
plied directly to the Keplerian case, giving V(P, e,t p \d) an- 
alytically from equations (27) to (33). As for circular or- 
bits, the distribution of velocity amplitude at each (P, e, t p ), 
V(K, P,e,t p \d), can be recovered, being well-approximated 
by equation (38). 



4.1 Calculation of V(P, K, e\d) 

For a Keplerian orbit, the radial velocity can be written 
V = 7 + K [cos(0 + io p ) + e cos uj p ] (39) 
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Figure 3. Results of Keplerian fits to the HD 4203 data from Butler et al. 2006, including a long term trend. The dotted curves show 
Gaussian distributions with central values and standard deviations matching those given by Butler et al. 2006. 100 values of K, 30 
eccentricities and 30 periods were calculated in the range shown. The contours enclose 10%, 50%, 90% and 99% of the probability. 
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Figure 2. Results of Keplerian fits to the HD 4203 data from 
Butler et al. 2006, including a linear trend. In this coarse scan 
of parameter space, V(P, e, K\d) is calculated for 10 eccentricities 
between and 0.9, 10 velocities between 1 and 217 m/s (twice 
the velocity span of the data), and 7978 periods between 1 day 
and 1996 days (the time span of the data). 



4.2 Example 

As an example, we return to the HD 4203 data considered 
previously. We first calculate V(P, e\d) for a grid in P and 
e. The integration over t p is carried out using a simple algo- 
rithm in which we double the number of equally-spaced t p 
values until the required accuracy is obtained. For each com- 
bination of P, e, and t p considered, we analytically integrate 
over 7, K, and U) p , and at the same time use equation (38) 
to keep track of P(K; P, e, t p \d). We use Newton's method 
to solve Kepler's equation, taking advantage of the fact that 
the required derivative can be calculated analytically. Our 
implementation of this algorithm takes « 5 x 1CP J s per P, 
e, and t p value considered, with 30 K values tracked through 
the calculation. For an average 200 values of t p , 10 eccen- 
tricities, and 3000 periods, the total time needed is « 30 s 
for a scan of parameter space. We have also implemented 
the grid-based approach, and find that it is about 10 times 
slower than the analytic approach. The results agree well 
between both techniques. 

The results for HD 4203 are shown in Figures 2 and 
3. We first run a coarse scan of the parameter space for 
a single Keplerian orbit plus a linear trend. We calculate 
4Nf m 8000 frequencies, corresponding to the period range 
1 day to « 2000 days (the time span of the data), 10 ec- 
centricities between and 0.9, 10 velocities between 1 m/s 
and 216 m/s (twice the velocity span of the data). The re- 
sulting constraints on P, e and K are shown in Figure 2. 
The odds ratio is 4 x 10 4 for the Keplerian orbit plus linear 
trend compared to a constant velocity model. We show the 
results including a linear trend, because the best fit model 
presented by Butler et al. (2006) includes a trend, but in 
fact our results at this stage do not require a trend. The 
odds ratio for a similar search but without the linear term 
is 5 x 10 4 . 
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We then carry out a more detailed calculation of the 
parameter space near the best fitting model corresponding 
to the peak in V(P\d) at « 440 days in Figure 2. The re- 
sults are shown in Figure 3. The odds ratio is 9.5 x 10 10 for 
a Keplerian orbit plus trend compared to a constant model 
(for the ranges of parameters shown in Fig. 3). The much 
larger value of the odds ratio compared to our coarse calcu- 
lation is because the parameter space considered is smaller 
and the peak in V(P, e, K\d) has now been resolved. We can 
rcnormalize the odds ratio to correspond to the full range of 
parameter space considered in the coarse search by multiply- 
ing by the ratio of log P2 /Pi and log K2 jK\ in each calcu- 
lation. Doing this, we find an odds ratio 7.4 x 10 7 . Without 
the linear trend the odds ratio is 100 times smaller, 7 x 10 , 
normalized to the full range of parameters. This indicates 
that a model with a linear trend is strongly preferred given 
this data. Without the linear trend, the probability peaks 
at similar values of P and K, but with a larger eccentricity, 

0.7. 

The dotted curves in Figure 3 show Gaussian distribu- 
tions with the central values and standard deviations given 
by Butler et al. (2006) for K, P, and e. Overall there is good 
agreement with the central values and widths. 

Repeating the calculation shown in Figure 3 with the 
grid-based method for marginalizing over K and lu p gives 
almost identical constraints on orbital parameters, but a 
smaller odds ratio by a factor of two, 3.5 x 10 7 compared to 
7.4 x 10 7 . We have also checked that other peaks in V{P\d) 
that can be seen in Figure 2 do not contribute significantly 
to the odds ratio. The next most important is the peak at 
P « 800 days, but its odds ratio is 400 times smaller than 
the peak at 432 days shown in detail in Figure 3. 

The coarse sampling for HD 4203 gave an odds ratio 
that was a factor of 400 smaller than the final odds ratio 
obtained by zooming in on the most significant peak. We 
find that increasing the period sampling by a factor of two 
to 8TA/ gives an odds ratio from the coarse search in good 
agreement with the odds ratio from zooming in on the peak. 



5 COMPARISON WITH PREVIOUS WORK 

In §4, we presented an algorithm that can efficiently com- 
pute P(P, K, e\d) for a radial velocity data set. As described 
in §2, this contains information about the constraints on P, 
K, and e and also allows a false alarm probability to be cal- 
culated. We now use our algorithm to recalculate results in 
the literature from MCMC and other techniques and com- 
pare. 

5.1 Orbital parameter constraints from MCMC 
calculations 

Ford (2005) used a MCMC calculation to study the con- 
straints on orbital parameters from radial velocity data, and 
this paper has been followed by several others (Ford 2006, 
2008; Ford & Gregory 2007; Gregory 2005b, 2007a,b; Balan 

6 Lahav 2009). We have calculated the constraints on or- 
bital parameters for the different single planet cases consid- 
ered in these papers, and overall the agreement is excellent. 

One difference is that in several published cases, the 
posterior probability for eccentricity drops towards zero at 
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Figure 4. The eccentricity distribution derived for HD 76700, 
using data from Tinney et al. 2003. The solid curve is for an- 
alytical marginalization over the noise scaling parameter k, the 
dotted curve is for k = 1, and the dashed curve shows eP(e\d), 
corresponding to a uniform prior in d(ecosoj p )d(esmio p ). 

low eccentricities, whereas we find V(e\d) is approximately 
constant as e goes to zero. For HD 76700, this difference 
appears to be because of the different prior assumed by Ford 
(2005). The MCMC calculations in that paper take steps in 
e cos u>p and e sin uo p in such a way that the assumed prior is 
uniform in d(ecosuj p )d(esmu>p) giving a prior ededui p oc e. 
In Figure 4, we allow for this different prior by plotting 
eP(e\d), and the result compares favorably with Figure 2 
of Ford (2005). (Ford 2005 discusses the use of importance 
sampling, in which the samples are weighted oc 1/e to give 
an effective prior uniform in e, but this does not seem to 
have been applied in Figure 2 of that paper). 

For HD 72659, marginalization over the extra noise 
source opens up considerable parameter space at low eccen- 
tricity. In Figure 5, we show the constraints on eccentricity 
and period with k fixed at k = 1 and with k marginalized 
over. Ford (2005), unlike later papers (e.g. Ford 2006) does 
not include an additional noise term, and our results for 
k = 1 compare well with Figures 4 and 5 of that paper. 

5.2 Odds ratios from Gregory's parallel 
tempering MCMC approach 

In a series of papers, Gregory has developed a MCMC code 
which uses parallel tempering to exchange information be- 
tween chains running with different "temperatures". Com- 
bining the results of different chains gives the total posterior 
probability for the model, allowing calculations of odds ra- 
tios and therefore model comparisons. 

Gregory (2005b) analyzed 18 radial velocities for 
HD 73526 from Tinney et al. (2003). The period range was 
from 0.5 days to 3732 days, and velocities from to 400 m/s 
using a Jeffrey's prior with a break at 1 m/s. An additional 
noise term was added which was allowed to range between 
and 100 m/s. He pointed out that there were two addi- 
tional possible solutions with P 128 and 376 days besides 
the previously obtained solution at P ~ 191 days. A chain 
covering the entire parameter space did not converge, and 
so separate chains were run focussing on each of the three 
probability peaks. The odds ratio for a planet compared to 
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Figure 5. The eccentricity distribution and joint eccentricity- 
orbital period distribution derived for HD 72659. The top pan- 
els are for analytic marginalization over the noise parameter k, 
whereas the bottom panels take k = 1 as in Ford 2005. Contours 
enclose 10, 50, 90 and 99% of the probability. 

a constant velocity was found to be 9.3 x 10 5 (Table 5 of 
Gregory 2005b). We ran a calculation with the same period 
and velocity range as Gregory (2005b) (except that we take 
the lower bound in K to be 1 m/s with a Jeffrey's prior) 
(9940 frequencies, 30 eccentricities and 30 velocities). The 
odds ratio was 3.3 x 10 6 . Zooming in on the three peaks 
gives probability distributions for e, P, and K that are very 
similar to the results of Gregory (2005b). The odds ratios for 
the P w 128, 190, and 376 day peaks are 2.4 x 10 4 , 1.1 x 10 5 , 
and 1.0 x 10 (assuming the full prior range so that these 
numbers can be compared). The sum of these, 1.1 x 10 6 
agrees well with the odds ratio found by Gregory (2005b) 
whose odds ratio includes only these three peaks. The rela- 
tive probabilities of the three peaks are 2%, 10% and 88%. 
Gregory (2005b) found relative probabilities of 4%, 3% and 
93%. 

Gregory (2007a) found evidence for a second planet in 
HD 208487; we compare to their odds ratio and posterior 
probability for a one-planet fit. The posterior probability 
distributions were calculated for the 35 velocities from But- 
ler et al. (2006). We find excellent agreement with the distri- 
butions of P, e and K shown in Figure 7 of Gregory (2007a). 
The odds ratio for a single planet model for this data (Ta- 
ble 6 of Gregory 2007a) was 1.7-2.6 x 10 4 for two different 
choices of the turnover in the modified Jeffrey's prior for 
the extra noise scale. For the parameter ranges in Figure 
7 of Gregory (2007a), we find an odds ratio of 1.4 x 10 s . 
Rescaling to a velocity range 1-2129 m/s, and period range 
1 day to 1000 years, this becomes 6.1 x 10 4 , a factor of 3 
times greater than Gregory (2007a) . (The details of the pri- 
ors were different, for example, the upper limit on velocity in 
Gregory 2007a's prior depended on period and eccentricity, 
but we expect this to give only a small difference). 

Gregory (2007b) presented evidence for three planets 
in HD 11964 from 87 radial velocities in the Butler et al. 
(2006) catalog. The odds ratio reported for the single planet 
model is 3 x 10 9 (Table 4 of Gregory 2007b) . We find an odds 
ratio in good agreement, 2 x 10 9 . Although Gregory (2007b) 
does not show posterior probability distributions for orbital 
parameters for the one planet model, the distributions of P, 



e and K we find compare well with those for the P ~ 2000 
day signal in the three planet model of Gregory (2007b). For 
this data, Butler et al. (2006) include a linear term. We find 
the odds ratio for a linear versus constant no-planet model 
to be 1300. Including a linear term in the planet model gives 
an odds ratio of 3 x 10 6 , much smaller than the odds ratio for 
a planet model only. Therefore, we find that a single planet 
model with P ~ 2000 is preferred over a linear trend only 
or planet plus linear trend by a large factor (in agreement 
with Wright et al. 2007 who also concluded that the trend 
reported by Butler et al. 2006 was likely spurious). 

5.3 False alarm probabilities 

Marcy et al. (2005) discuss the calculation of false alarm 
probabilities using a scrambled velocity method in which 
the residuals to the best-fitting Keplerian orbit are used as 
an estimate of the noise distribution. In that paper, they 
announced five new planets from the Keck Planet Search. 
False alarm probabilities were calculated for two cases that 
looked marginal, HD 45350 (FAP< 0.1% scrambled, 4x 10~ 5 
F-test) and HD 99492 (FAP^ 0.1% scrambled, 3 x 10~ 4 F- 
test). For HD 99492, we find odds ratios scaled to 1.0 for 
no planet are 0.33 for a linear trend but no planet, 1.66 for 
a planet, 200.0 for a planet plus linear trend. Therefore, a 
linear trend is preferred in this case. The FAP using equation 
(13) for the odds ratio is 7 x 10~ 3 . For HD 45350, we find 
odds ratios: 1.0, 0.18, 6.7x 10 5 , 4.7x 10 5 , giving FAP^ 10" 6 . 
As Marcy et al. (2005) noted, the evidence for a linear trend 
in this source is marginal (the odds ratios are similar with 
and without a trend). 

Cumming (2004) described a quick estimate of the FAP 
based on an F-test at each independent frequency. General- 
izing the Lomb-Scargle periodogram to eccentric orbits, the 
idea is to define a power at each frequency 

z= (iv-5)A x 2 = (jv-sXxg-xLp) ^ (44) 

4Xkc P 4x Kcp 

For Gaussian noise, z follows the F^at-s distribution 8 , which 
allows a calculation of Prob(z > z max ) for an observed max- 
imum power Zmax- The FAP is then 

FAP = l-(l-Prob(2 > w))"' « iV/Prob( 2 > w)-(45) 

The number of independent frequencies Nf can be estimated 
as N f w TAf. 

We have used this approach to calculate the FAP for the 
84 stars with published radial velocities as part of the Butler 
et al. (2006) catalog of exoplanets. To find Xkc P > we follow 
the automated procedure used by Cumming et al. (2008), 
which involves using the top two well-separated peaks in 
the Lomb-Scargle periodogram as starting periods for full 
Keplerian fits. To compare with the Bayesian odds ratios, 
we convert the F-test FAP into an odds ratio by inverting 
equation (9). To find Bayesian odds ratios, we run a coarse 
sampling of the parameter space with 8TA/ periods for each 
of these 84 stars, with and without a long term linear trend. 

8 Assuming that the no planet model being compared to is a 
constant velocity model. If a linear trend is included in the no 
planet model and the planet model, z is defined with a factor of 
N — 6 replacing TV — 5, and then follows the i^jy-e distribution. 
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Figure 6. Comparison between logarithm of the odds ra- 
tio log 10 A from the Baycsian calculation and the F-test. The 
Bayesian odds ratios are from coarse sampling (8TA/ periods, 
10 eccentricities) of the 84 radial velocity data sets from Keck, 
Lick, and AAT published as part of the Butler et al. 2006 catalog. 
We compare with analytic F-test FAPs, converted to odds ratios 
using the relation A = (1/F) — 1. The crosses are the odds ra- 
tios for a linear trend versus constant velocity, the diamonds are 
odds ratios for a planet versus constant, and the triangles are for 
planet plus long term trend versus long term trend only. The up- 
per panel uses a Keplerian fitting routine to determine the F-test 
FAP, whereas in the lower panel we use the minimum \ 2 found 
in the Bayesian routine to calculate the analytic FAP. 



The results are shown in Figure 6. In the lower panel, 
we use the minimum value of \ 2 found in the Bayesian cal- 
culation to calculate the F-test FAP. In this case, the odds 
ratios are well-correlated, although with the Bayesian odds 
ratio between 1 and 1000 times smaller than the F-test odds 
ratio. In the upper panel, there is more scatter. This arises 
from differences between the minimum % 2 values found by 
the Keplerian fitting routine and the Bayesian routine. For 
example, the two points above and to the left of the upper 
panel of Figure 6 are for HD 80606, which has a very eccen- 
tric orbit. Our Keplerian fitting routine, which uses circular 
orbit fits as its starting point failed to find the best-fitting 
solution, whereas the Bayesian routine, with its systematic 
scan of parameter space did find it. Generally the scatter is 
downwards, indicating that the Bayesian routine sometimes 
find a larger minimum x 2 than the Keplerian fitting routine. 



Likely this is due to the finite period sampling, whereas the 
Keplerian fitting routine can adjust the period to lower x' ' ■ 
The fact that the Bayesian odds ratios tend to be lower 
than the F-test odds ratios indicates that the Bayesian cal- 
culation is more conservative than the F-test. In fact, this is 
expected. Cumming (2004) showed that the Bayesian odds 
ratio is closely related to the F-test (periodogram), but with 
a different definition for the number of independent frequen- 
cies. In the Bayesian calculation, the number of trials counts 
the frequencies, but also the range of the other parameters 
(Cumming 2004). In this way, the Bayesian calculation pe- 
nalizes models with larger ranges of parameters, for all pa- 
rameters, not just frequency. 



5.4 2D periodograms 

Wright et al. (2007) investigate the constraints that can 
be placed on the orbital parameters of long period orbits 
that have been only partially observed. They calculated the 
minimum \ 2 &t points across the m sin i-P plane. Similarly, 
O'Toole et al. (2008) introduced a "2D Keplerian Lomb- 
Scargle periodogram" (2DKLS) in which the periodogram 
power is evaluated on a grid of P and e, with a full Keplerian 
fit carried out at each point. O'Toole et al. (2008) discuss 
the considerable computing resources being used to conduct 
simulations of detectability using this new 2D periodogram. 
The techniques we discuss earlier for rapid evaluation of mul- 
tiple x 2 values could prove useful in more efficiently evalu- 
ating the 2DKLS periodogram. The constraints on P-e or 
P-K calculated in this paper differ from Wright et al. (2007) 
and O'Toole et al. (2008) in that for each choice of P, e or 
P, K all values of the other parameters are taken into ac- 
count, weighted by their probability, rather than finding the 
best fit values of the other parameters. This is the standard 
difference between Bayesian and frequentist approaches. 

O'Toole et al. (2008) mention that one of the reasons 
for looking at the periodogram power as a function of P and 
e is to help with detection of highly eccentric orbits. They 
consider the e = 0.97 planet around HD20782 as an example. 
Their best fit has e = 0.97±0.01, P = 591.9±2.8, and K = 
185.3 ±49.7. Our results for this data are shown in Figure 7. 
The discrete nature of the K distribution is due to the finite 
sampling of the grid in eccentricity. The O'Toole et al. (2008) 
solution lies on our contours, but towards the edge. The 
Bayesian calculation, which averages over the marginalized 
parameters, opens up a wider parameter space than the best- 
fit and error bars from O'Toole et al. (2008) suggest. 



5.5 HD 5319 

HD 5319 has a planet with minimum mass 1.9 Mj in a 675 
day low eccentricity orbit (Robinson et al. 2007). This is 
an interesting example to compare to because the analysis 
of Robinson et al. (2007) used several different statistical 
methods. First, they used Monte Carlo simulations of data 
sets with noise only (simulated by selecting with replace- 
ment from the observed velocities) to assess the FAP, find- 
ing 1.3 x 10 -3 . They used both a scrambled velocity Monte 
Carlo simulations and an MCMC Bayesian calculation to es- 
timate the uncertainties in the derived orbital parameters. 
They used an F-test to test the significance of including a 
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Figure 7. Results of Keplerian fits to the HD 20782 data from O'Toole et al. 2008. 30 eccentricities, 30 periods and 100 K values were 
calculated in the ranges shown. Contours enclose 10%, 50%, 90%, and 99% of the total probability. 





Figure 8. Results of Keplerian fits to the 5319 data including a linear trend. 30 eccentricities, 30 periods and 100 K values were calculated 
in the ranges shown. Dotted curves show Gaussian distributions with central values and standard deviations taken from Robinson et al. 
2007. Contours enclose 10%, 50%, 90%, and 99% of the total probability. 



linear trend in their model, finding a FAP of 3 x 10 4 indi- 
cating that a linear term is strongly preferred. 

The results of our calculation are shown in Figure 8. 
The dotted lines show the best fitting parameters and the 
errors found by Robinson et al. (2007), assuming Gaussian 
distributions, and agree well both in terms of central val- 
ues and widths. Interestingly, the MCMC simulations run 
by Robinson et al. (2007) did not agree as well with their 
scrambled velocity approach, whereas we find good agree- 



ment. The odds ratio for a trend in the no planet model is 
0.9. For models with a planet, the odds ratios are 9.0 x 10 s 
(with trend) and 1.0 x 10 6 (without trend). The model with 
a trend therefore has greater odds by a factor of 10 3 , in good 
agreement with the F-test FAP of 3 x 10 -4 found by Robin- 
son et al. (2007). However, the overall false alarm probabil- 
ity we find ~ 10 -9 is much smaller than the simulations of 
Robinson et al. (2007) suggested, ~ 10~ 3 . 
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Figure 9. Comparison between the 99% upper limits on K for 
circular orbits determined by Cumming et al. 1999 for 63 stars 
from the Lick Planet Search, and the 99% upper limits on K 
from a Bayesian analysis of the same data. Solid triangles or cir- 
cles include a linear trend in the fits (these are the 7 stars that 
Cumming et al. 1999 found to have a significant slope), whereas 
open triangles or circles do not include a linear trend. The dotted 
line indicates a 1:1 correspondence between the two calculations 
of the upper limit. The black triangles are for circular orbits, the 
red circles are for eccentric orbits with e < 0.5 and the green 
circles are for eccentric orbits with e < 0.7. 

5.6 Upper limits on K 

In an analysis of the Lick Planet Search, Cumming et 
al. (1999) calculated upper limits for 63 stars with non- 
detections. They used a Monte Carlo approach, in which 
simulated data sets with a circular orbit plus noise were 
analyzed and the velocity amplitude determined which re- 
sulted in detection 99% of the time. We have reanalyzed the 
same data using our Bayesian scheme, first with circular or- 
bits, and then with eccentric orbits. We calculate the 99% 
upper limit by f* 99 dK V{K\d) = 0.99 (where V(K \d) 
is normalized so that the total probability is unity). 

The results are shown in Figure 9. The triangles are 
for circular orbit models, and the circles are for eccentric 
orbits. For the 7 stars found to have a significant linear 
trend by Cumming et al. (1999), we include a linear trend 
in the model. Overall the agreement is good. Cumming et 
al. (1999) (and Cumming et al. 2008) calculate upper limits 
for circular orbits to reduce the computational time needed. 
Based on the calculations of the effect of eccentricity on de- 
tectability of Endl et al. (2002) and Cumming (2004), they 
proposed that Kgg for circular orbits would be a good esti- 
mate of -K99 for orbits with e < 0.5. We can test that here 
by calculating Kgg from the partial K distribution 

/" e cutoff 

V{K\d) = de V{e,K\d) (46) 

Jo 

with different cutoffs e C utoff- For e cu toff = 0.9, we find that 
the value of Kgg is generally much greater than Kgg for 
circular orbits, due to a tail of large K, large eccentricity 
solutions. However, for e cuto ff = 0.5, the agreement is very 
good. This is shown in Figure 9, where we show results for 
e cu toff = 0.5 (red symbols) and e cuto ff = 0.7 (green symbols). 
The e C utofi = 0.7 values of Kgg are significantly greater than 
the circular orbit or e C utoff = 0.5 values. 



6 SUMMARY AND CONCLUSIONS 

In this paper, we consider Bayesian analysis of radial ve- 
locity data. An advantage of this kind of analysis over tra- 
ditional methods is that a single calculation gives the false 
alarm probability and the probability distributions of or- 
bital period, eccentricity and velocity amplitude, allowing 
error bars or upper limits on these quantities to be deter- 
mined. Using periodogram methods, separate calculations 
are required for each of these quantities, typically requiring 
many Monte Carlo trials. 

Previous work on Bayesian analysis of radial velocities 
has used Markov Chain Monte Carlo (MCMC) techniques 
(although see Ford 2008 who used analytic techniques to 
partially carry out the marginalization for circular orbits). 
Our approach has been to apply some exact and approx- 
imate analytic results (based on previous work by Jaynes 
1987 and Bretthorst 1988) to the marginalization integrals 
for Keplerian fits to radial velocity data. In particular, we 
analytically integrate over the linear model parameters for 
each combination of P, e, and t p , and use an analytic ap- 
proximation (eq. [38]) to reconstruct the probability distri- 
bution of K. An implementation of this algorithm in IDL is 
available on request from the authors. 

With this approach, a full search of parameter space 
for a single Keplerian orbit takes several minutes on a 2- 
3 GHz processor, or several seconds for circular orbits, mak- 
ing it applicable to data sets from large velocity surveys. 
Constraints on orbital parameters (which involve surveying 
smaller regions of parameter space) can be calculated in sec- 
onds, competitive with MCMC techniques 9 . Our calculation 
can certainly be improved further. For example, we have fo- 
cussed on the marginalization over the linear parameters in 
this paper, and used the simplest approach of evaluation on 
an evenly-spaced grid to integrate over the remaining pa- 
rameters P, K, and e. 

We compared our results with previous calculations. 
The constraints on orbital parameters and odds ratios agree 
well with MCMC results. We find that the Bayesian odds 
ratios are systematically lower than F-test odds ratios by a 
factor between 1 and 1000. This is due to the different ac- 
counting of trials in the two calculations (Cumming 2004), 
with the Bayesian calculation including an Occam's razor 
penalty which accounts for the range of all parameters rather 
than only the frequency range. The techniques we have de- 
veloped for rapidly calculating \ 2 ma y have application to 
other techniques, such as the 2D periodograms of Wright et 
al. (2007) and O'Toole et al. (2008). We find good agreement 
with the upper limits on velocity amplitude K calculated for 
circular orbits by Cumming et al. (1999) if we restrict our 
attention to e < 0.5. More eccentric orbits give rise to a tail 
of solutions at large K. This shows that characterizing the 
K distribution with a single parameter (e.g. the 99% upper 
limit; Cumming et al. 2008) is not appropriate for popula- 



9 In Ford (2006), the computer time needed was 
~ 10~ e s N b s NpL c N c where 7V ODS is the number of ob- 
servations, JV P the number of planets, L c the length of each 
chain, and N c the number of chains considered. For 30 obser- 
vations, 1 planet, 10 chains each of length 10 4 (multiple chains 
are required to assess convergence; Ford 2006), the total time 
required is si 3 s. 
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tion analyses with highly eccentric orbits included. On the 
other hand, for low to moderate eccentricity orbits (e < 0.5), 
upper limits can be derived from circular orbit fits which is 
much less numerically intensive. 

The division of Keplerian parameters into "fast" and 
"slow" may prove useful in MCMC simulations. At the least, 
the systemic velocity does not need to be included as a pa- 
rameter; it can be quickly evaluated for each set of the other 
parameters, and used to evaluate \ 2 (this was also noted by 
Ford 2006). One possible complication is that Ford (2005) 
takes steps in a mixture of fast and slow parameters, e cos to 
and esinw, to help speed convergence. Separating the slow 
and fast parameters could potentially reduce efficiency in 
this case. Further investigations are needed. 
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APPENDIX A: INCLUDING A LINEAR TERM 
(LONG TERM TREND) 

In the main text, the "no planet" model that we have com- 
pared the sinusoid and Kepler fits to was a constant velocity 
model. Often, a linear term is included in the fit to account 
for long timescale trends in the data. Since adding a linear 
trend adds one extra linear term to the model, we can an- 
alytically marginalize over the slope in the same way as we 
marginalize over the constant term. In this Appendix, we 
give the formulae to do that. 



Al Is there evidence for a long term trend? 

First, consider a constant versus a linear model. Minimizing 
X 2 as a function of 7 for Vi = 7, we find the best fit constant 
term is 



7o = (v), 

the corresponding minimum value of \ 2 is 

{{v 2 }), 



and 

det a = Wi 



(Al) 
(A2) 
(A3) 



Inserting these expressions into equation (26) with the num- 
ber of parameters m — 1 gives the posterior probability for 
a fit of a constant. 

For a straight line fit, Vi = 7 + fiU, we find 



7o = 



{v){t 2 )-{vt){t) 

«* 2 » 



(A4) 
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A> = 



det a 

(E^F 



xL 



((yt)) 
«* 2 » 

{{t 2 }) 

{v 2 ) -2yo{v) -2po{vt) 
+70 +2mA)7o(t) + A?<* 2 > 



(A5) 
(A6) 

(A7) 



Using equation (26), the odds ratio is 



A = 



(xL 



-(JV-2)/2 



(Xconst) 



-(N-l)/2 



1/2 



(E^)«* 2 », 

r((iV-2)/2) 1 
r((JV-l)/2) A/3' 



(A8) 



where A/3 is the prior range for (3 (the prior range for 7 
is the same in both models, and cancels). Here, we take /3 
to lie between ±Av/T — ±(u max — Vmi n )/T, giving A/3 = 
2Av/T, that is we use the range of velocity amplitudes that 
we consider and the time of the observations to set the range 
of slopes. 

There is an important issue to mention here (we thank 
the referee for raising it), that the prior range of parame- 
ters should not depend on the data (the prior probability 
should reflect our state of knowledge before the data were 
taken). That is not true here since the range of observed 
velocities is used to determine what range of velocity am- 
plitudes to search. Strictly, the normalization of the prior 
should not reflect this but be completely independent of the 
data. For example, the range of slopes could be set by look- 
ing at the range of slopes in previous planet discoveries (for 
example, in Butler et al. 2006 the reported slopes extend 
to ps 100 m s _1 yr _1 ), or the range of velocity amplitudes 
extend up to a maximum set by the amplitude induced by 
a pb 10 Mj companion Gregory (2005b); Ford & Gregory 
(2007). However, the final odds ratios are not very sensi- 
tive to the exact choice of prior range. The range of velocity 
amplitudes enters the normalization logarithmically (since 
the prior is taken to be uniform in log). The range of slopes 
has the largest effect since it enters linearly, but we find 
that using a different choice, e.g. a range of /3 from —100 
to +100 m s _1 yr~ changes the odds ratios by factors of a 
few to several only. 



A2 Including a trend in the circular or Keplerian 
orbit fit 

Consider the model 

Vi = 7 + PU + A sin 6i + B cos 6 t ( A9) 

where 8i = 2-Kti/P for a circular orbit fit. Minimizing x 
with respect to the four parameters 7, (3, A, B, we find that 
their best fit values can be written in a concise way by defin- 
ing a new average 

{(xt)){{yt)) 



xy = {{xy)) . 

Using this notation, 

70 = (v) - (3 (t) - A(S) - B(C) 
{{vt))~A{{St))-B{{Ct)) 



A) = 



«* 2 » 



(A10) 

(All) 
(A12) 



Bo = 



C 2 S 2 
^S 2 - 



-SC 
^SC 



(A13) 



(A14) 



c 2 s 2 - sc 

The expressions for Ao and Bo are the same as previously, 
but with the new averages. We also find 

det a 



«* 2 » 



S 2 C 2 - sc) 



(A15) 



and 

xl 



-- {{v 2 ))-2(A {{vS))+B {{vC))) 

+A 2 ((S 2 }} + B 2 ((C 2 }} + 2A B ((SC)) 

-0o«* a » 
= v 2 -2 ( Ao vS + B vC) 

+A 2 S 2 + B 2 C 2 + 2A Bo SC. (A16) 

Equations (All) to (A16) replace equations (28) to (32) 
when a long term trend is included. They are essentially 
the same, but with the average xy used instead of {{xy)). 
Equation (26) with m = 4 then allows marginalization over 
the four parameters A, B, /3 and 7. 

Similarly, for the grid based approach, the expression 
for x 2 is °f the same form as equation (21), but with the 
averages calculated as xy instead of {{xy)). 



APPENDIX B: LIKELIHOOD FOR FIXED 
NOISE SCALING PARAMETER K 

In the main text, we integrated over the noise scaling param- 
eter k, giving the likelihood in equation (6) (t-distribution) 
rather than equation (2) (exponential). As we argued in §2.3, 
the analytic marginalization over an infinite range of k is a 
good approximation for a reasonable spread in k. However, 
it could be that we are able to predict k quite accurately, 
for example, if the level of stellar jitter has been predeter- 
mined for a particular star, in which case we might want to 
carry out a calculation for fixed k. Also, this would allow a 
calculation of the posterior probability for k. 

For fixed k, we have V{d\a) oc k~ N exp(-x 2 (a)/2fc 2 )- 
The normalization over the constant term is then, taking 
circular orbits as an example, 



P(d\<(>,K, P) oc 



dj k exp 



X 

2k 2 



(Bl) 



where x 2 (7) has the quadratic form of equation (15). There- 
fore we can take 



V{d\4>,K, P) = k 



-(JV-l) 



exp 



X 2 hoA,K,P] 
2k 2 



(B2) 



as a replacement for equation (19), where we set the prefac- 
tor to unity as it cancels when we form the odds ratio. 

Similarly, the analytic result giving marginalization over 
m parameters for a general linear model (eq. [26]) becomes 



/ 



dak exp 



(2n) m/2 k- 



(N-m) 



Vdeta 



exp 



X 

2k 2 

A 

2k 2 



(B3) 
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As a check, marginalization over k at this stage takes us 
back to equation (26). 
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